ChatGPT Answers Beat Physicians' on Info, Patient Empathy, Study Finds



https://www.medpagetoday.com/practicemanagement/informationtechnology/104255


— Evaluators gave chatbot the better rating for responses to patient queries by a nearly 4:1 ratio


The artificial intelligence (AI) chatbot ChatGPT outperformed physicians when answering patient questions, based on quality of response and empathy, according to a cross-sectional study.

Of 195 exchanges, evaluators preferred ChatGPT responses to physician responses in 78.6% (95% CI 75.0-81.8) of the 585 evaluations, reported John Ayers, PhD, MA, of the Qualcomm Institute at the University of California San Diego in La Jolla, and co-authors.

The AI chatbot responses were given a significantly higher quality rating than physician responses (t=13.3, P<0.001), with the proportion of responses rated as good or very good quality (≥4) higher for ChatGPT (78.5%) than physicians (22.1%), amounting to a 3.6 times higher prevalence of good or very good quality responses for the chatbot, they noted in JAMA Internal Medicine.

Furthermore, ChatGPT's responses were rated as being significantly more empathetic than physician responses (t=18.9, P<0.001), with the proportion of responses rated as empathetic or very empathetic (≥4) higher for ChatGPT (45.1%) than for physicians (4.6%), amounting to a 9.8 times higher prevalence of empathetic or very empathetic responses for the chatbot.
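The "3.6 times" and "9.8 times" figures follow directly from the reported proportions. As a back-of-the-envelope check (a minimal sketch, not the authors' code — the function name here is illustrative):

```python
# Reconstruct the prevalence ratios from the proportions reported
# in the study: 78.5% vs 22.1% for quality, 45.1% vs 4.6% for empathy.

def prevalence_ratio(chatbot_pct: float, physician_pct: float) -> float:
    """Ratio of the chatbot's rate of top-rated (>=4) responses to the physicians'."""
    return chatbot_pct / physician_pct

quality_ratio = prevalence_ratio(78.5, 22.1)  # good/very good quality
empathy_ratio = prevalence_ratio(45.1, 4.6)   # empathetic/very empathetic

print(f"quality: {quality_ratio:.1f}x")  # prints "quality: 3.6x", matching the article
print(f"empathy: {empathy_ratio:.1f}x")  # prints "empathy: 9.8x", matching the article
```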

"ChatGPT provides a better answer," Ayers told MedPage Today. "I think of our study as a phase zero study, and it clearly shows that ChatGPT wins in a landslide compared to physicians, and I wouldn't say we expected that at all."

He said the team wanted to see whether ChatGPT, developed by OpenAI, could help ease the burden that answering patient messages places on physicians, which he noted is a well-documented contributor to burnout.

Ayers said he approached the study with another population in mind as well, pointing out that while the burnout crisis may affect roughly 1.1 million providers across the U.S., it also affects the approximately 329 million patients who engage with those overburdened healthcare professionals.

"There are a lot of people out there asking questions that maybe go unanswered or get bad answers. What do we do to help them?" he said. "I think AI-assisted messaging could be a game changer for public health."

He noted that AI-assisted messaging could change patient outcomes, and he wants to see more studies that focus on evaluating these outcomes. He said he hopes this study will motivate more research on this use of AI because of its potential to improve productivity and free up clinical staff time for more complex tasks.

In an invited commentary, Jonathan H. Chen, MD, PhD, of Stanford University School of Medicine in Palo Alto, California, and co-authors highlighted ways for physicians to begin implementing the technology into clinical practice, such as using it to simplify text-based tasks or improve medical training, but cautioned that AI models also have the potential to exacerbate biases and produce other harms.

"Medicine is much more than just processing information and associating words with concepts; it is ascribing meaning to those concepts while connecting with patients as a trusted partner to build healthier lives," they wrote.

In an accompanying perspective, Teva D. Brender, MD, of the University of California San Francisco School of Medicine, wrote that the promise of AI to ease the burden of documentation and other common, often repetitive, written tasks should be weighed against the potential harms, such as adding to "note bloat" or exacerbating existing biases.

"Physicians will need to learn how to integrate these tools into clinical practice, defining clear boundaries between full, supervised, and proscribed autonomy," he added. "And yet, I am cautiously optimistic about a future of improved healthcare system efficiency, better patient outcomes, and reduced burnout."

After seeing the results of this study, Ayers thinks that the research community should be working on randomized controlled trials to study the effects of AI messaging, so that the future development of AI models will be able to account for patient outcomes.

"If we do the studies and if we create the incentives for patient outcomes to become the priority of AI system messaging, then we can discover those benefits and maximize those and we can discover any incidental harms and minimize those," he added. "I'm pretty optimistic about what it could do for people's health."

For this study, the researchers randomly selected 195 exchanges from the Reddit forum r/AskDocs in October 2022 in which a verified physician responded to a public question. The researchers entered each original question into a new session of ChatGPT 3.5 in late December, with the physicians anonymized.

Each set of questions and responses was evaluated by three licensed physicians, who were asked to choose "which response was better" and to judge "the quality of information provided" and "the empathy or bedside manner provided." They scored each assessment on 5-point scales, from "very poor" to "very good" for quality and from "not empathetic" to "very empathetic" for empathy.

Physician responses were significantly shorter than chatbot responses on average (52 words vs 211 words; t=25.4, P<0.001).

Ayers and co-authors noted several limitations to their study, including the fact that it was not designed to show how ChatGPT would perform in a clinical setting. In addition, the measures of quality and empathy were not validated, and the evaluators did not assess responses for accuracy.


Michael DePeau-Wilson is a reporter on MedPage Today's enterprise & investigative team. He covers psychiatry, long COVID, and infectious diseases, among other relevant U.S. clinical news.


